I. INTRODUCTION

Item response theory, often referred to as latent-trait theory, has provided
the tools for solving the problem of tailoring a test to the individual.
Traditionally, the same test is given to all individuals regardless of the
ability level of the individual and the difficulty level of the test. This
mismatch may result in decreased precision of measurement, which may, in turn,
lead to misclassification, errors of selection, poor use of scarce resources,
and selection of individuals who are ill-equipped to perform the tasks at hand.

The development of latent-trait theory (see Lord & Novick, 1968) is the latest
step in a constant trend toward making human aptitude measurement more precise
by adapting tests to examinees.

As early as the beginning of the twentieth century, Alfred Binet (see Peterson,
1926) developed adaptive tests for educational screening. The success of the
group-administered tests developed during the First World War, coupled with the
long administration time of the Binet tests, redirected test development toward
the more economical paper-and-pencil, group-administered, non-adaptive measures
that have become the standard.

The advent of relatively inexpensive and portable computers has made
computer-directed adaptive testing feasible. In the last decade, numerous studies
have attempted to accomplish adaptive measurement using computers (see Weiss,
1977).

Computers, however, are prone to failure at unpredictable times and are still
more expensive than paper-and-pencil media. This effort, therefore, was designed
to investigate the feasibility of developing sophisticated adaptive tests that
do not rely on computer administration. Such tests would eliminate the need for
costly machines, capture the advantages of latent-trait theory, and be as
portable as ordinary test booklets.

II. METHOD

The Adaptive Test

For this effort, an adaptive test was defined as a test composed of several
scorable items that were administered sequentially, so that the item presented
was based on the result of the preceding question or on the results of all the
preceding questions. In an adaptive testing environment, the examinee is routed
from item to item, so that not all examinees necessarily answer all questions,
nor necessarily the same number of questions (McBride, 1977).

Item Pools

Two adaptive content areas, Word Knowledge (WK) and Arithmetic Reasoning (AR),
were used for the adaptive tests. Using the maximum likelihood procedure
described by Wingersky and Lord (1971), the test items for these content areas
were calibrated on a sample of approximately 1,600 Air Force recruits. Each
ability area was calibrated separately using the three-parameter logistic model.
... those were selected for tryout on small samples of Air Force basic recruits
to refine procedures and techniques. The prototypes were designed so that, once
the initial instructions were given, the subject would not require further
assistance to complete the test.
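The three-parameter logistic (3PL) model used for calibration gives the probability of a correct answer as a function of ability θ and three item parameters. A minimal sketch follows, assuming the common scaling constant D = 1.7 (a convention, not a value stated in this report); the item parameters in the demonstration are hypothetical.

```python
import math

# Logistic scaling constant. D = 1.7 is a common convention that makes the
# logistic curve approximate the normal ogive; it is assumed here, not stated
# in the report.
D = 1.7


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the three-parameter logistic model.

    a: item discrimination, b: item difficulty, c: lower asymptote (guessing).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical item parameters, for illustration only.
    for theta in (-2.0, 0.0, 2.0):
        print(f"theta={theta:+.1f}  P={p_correct(theta, a=1.2, b=0.0, c=0.2):.3f}")
```

At θ = b the probability is halfway between the guessing floor c and 1, which is why b is read as the item's difficulty.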

A routing test, followed by a measurement test, was used in each prototype.
These procedures resulted in a two-stage test protocol. Two methods of routing
the subject through the test were used. For one method, all subjects answered
all items in the first stage of the test. Depending on their performance on the
first stage, they were routed to one of five second-stage tests.
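In the first routing method, everyone answers the same first-stage items and the number-right score alone selects one of five second-stage tests. A minimal sketch, assuming four hypothetical cut scores (the report's actual cutting points are not given in this section):

```python
from bisect import bisect_right

# Hypothetical cut scores (not the report's values). Four cuts give five levels:
# 0-2 right -> level 1, 3-4 -> level 2, 5-6 -> level 3, 7-8 -> level 4, 9+ -> level 5.
CUT_SCORES = [3, 5, 7, 9]


def second_stage_level(responses: list[bool]) -> int:
    """Return the second-stage test (1 = easiest, 5 = hardest) for one examinee."""
    number_right = sum(responses)
    return bisect_right(CUT_SCORES, number_right) + 1


if __name__ == "__main__":
    print(second_stage_level([True] * 6 + [False] * 4))
```

Because only the count matters, this routing can be printed as a simple score-to-test lookup on an answer sheet, which suits paper-and-pencil administration.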

For the second routing method, all subjects started with the first item in the
first stage of the test. Depending on whether their response was correct or
incorrect, subjects were routed to a more or less difficult item. This same
procedure was followed for each subsequent item in the first stage.

The sequence of items answered determined the level of the test to be taken at
the second stage.
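The second routing method amounts to walking a branching table keyed on the current item and whether it was answered correctly. The sketch below is hypothetical: the item numbers, the table, and the level assignments are illustrative and do not reproduce Figure 1. It shows how converging paths let a few items serve many response sequences.

```python
# Hypothetical first-stage branching table (illustrative only, not Figure 1):
# (item, answered correctly) -> next item to present.
NEXT_ITEM = {
    (1, True): 3, (1, False): 2,
    (2, True): 5, (2, False): 4,
    (3, True): 6, (3, False): 5,  # paths "+-" and "-+" both lead to item 5
}

# (last item, answered correctly) -> second-stage test level (1 = easiest).
LEVEL = {
    (4, False): 1, (4, True): 2,
    (5, False): 3, (5, True): 4,
    (6, False): 4, (6, True): 5,
}


def route(answers: dict[int, bool], start: int = 1) -> tuple[list[int], int]:
    """Walk the branching table for one examinee.

    `answers` maps each item number to whether the examinee answered it correctly.
    Returns the items presented, in order, and the second-stage level assigned.
    """
    item, path = start, [start]
    while (item, answers[item]) in NEXT_ITEM:
        item = NEXT_ITEM[(item, answers[item])]
        path.append(item)
    return path, LEVEL[(item, answers[item])]


if __name__ == "__main__":
    print(route({1: True, 3: False, 5: True}))
```

The examinee never needs the table itself; a printed booklet only has to say "if correct, go to item X; if incorrect, go to item Y" after each item.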
Prototype II


Prototype II (PII) consisted of a set of two question booklets for each subtest.

The questions for the first part of each subtest were presented in a small,
spiral-bound booklet containing tabbed 7.62 x 12.70 cm (3 x 5-inch) cards and
cover pages. The questions for the second part of the subtest were printed in a
21.59 x 27.94 cm (8 1/2 x 11-inch) booklet. The examinees were referred to the
appropriate measurement test by directions provided on a separate one-page
instruction sheet. Each examinee used a total of two sets of question booklets
and instruction sheets for each administration.

The answer sheet for PII was scannable and had invisible numbers and marks
precoded in the response areas. The examinees used special crayons to mark
their answers. Use of these crayons revealed the previously hidden marks.

One 27.94 x 43.18 cm (11 x 17-inch) answer page, printed on both sides of the
paper, was used for the subtest.

A manual was provided for the administrator to explain the procedures to be
followed in PII. A visual aid was provided to help the administrator explain the
routing directions for PII. The visual aid was constructed to illustrate how the
hidden marks were to be revealed on the answer sheet when responding to each
test item.

Prototype III

For this third prototype (PIII), the questions were presented in a 21.59 x 27.94
cm (8 1/2 x 11-inch) booklet. The responses were recorded by the examinees on a
carbonless transfer answer-sheet set. Each examinee used two question booklets
and two carbonless transfer answer-sheet sets. Each answer-sheet set was
specifically designed to correspond to a particular subtest.


A carbonless transfer answer-sheet set consisted of two pages. The top page was
a machine-scannable answer sheet that was spot-glued to a second sheet of paper.
The reverse side of the machine-scannable answer sheet was covered with a block
pattern to inhibit reading of the second sheet, and was treated so that markings
made on the answer sheet were transferred to the second page of the set. The
second page provided the examinees with instructions that routed them to the
appropriate measurement test based on their responses to the first part of the
test.

An instruction manual for PIII was provided to the administrator. Two visual
aids were used by the administrator to explain the routing scheme for PIII. Each
visual aid corresponded to one page of the answer-sheet set. A pen with
water-based ink was provided for use by the administrator with the visual aids.


Routing Test Development

The routing test for Prototypes I and II (PI and PII) directed the examinee from
item to item depending on the response to the previous item. A
maximum-information item-selection procedure was used for these two routing
tests (Sympson, 1977). After each item was answered, the item that maximized the
item information function (Birnbaum, 1968) at the estimated ability level, θ,
was selected next. Fourteen items were available in each of these tests.
Figure 1 shows the possible paths through the items.
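Maximum-information selection can be sketched with Birnbaum's information function for the 3PL model, I(θ) = (Da)² · (Q/P) · ((P − c)/(1 − c))². The sketch assumes D = 1.7 and a hypothetical five-item pool; it illustrates the selection rule, not Sympson's (1977) actual procedure or the report's items.

```python
import math

D = 1.7  # logistic scaling constant (assumed convention)


def p3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta: float, a: float, b: float, c: float) -> float:
    """Birnbaum's item information function for a 3PL item."""
    p = p3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_next_item(theta_hat: float, pool, used: set[int]) -> int:
    """Index of the unused item with the most information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))


# Hypothetical pool of (a, b, c) item parameters.
POOL = [(1.0, -1.5, 0.2), (1.2, -0.5, 0.2), (0.9, 0.0, 0.2),
        (1.3, 0.6, 0.2), (1.1, 1.5, 0.2)]

if __name__ == "__main__":
    print(select_next_item(0.6, POOL, used=set()))
```

In a paper-and-pencil version, this selection is run once in advance for every reachable θ estimate, and the results are frozen into the branching paths printed in the booklet.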

Figure 1. Paths through the routing tests for PI and PII. (Numbers indicate
items; + and - indicate correct and incorrect responses, respectively.)

The routing test for Prototype III (PIII) was a short, peaked measure of
ability. Eight items were used in the Arithmetic Reasoning test and 10 items in
the Word Knowledge test.

Design of Administration Instructions

The administration instructions were prepared as integral parts of the
prototypes. The test administrators were to be available only to reinforce
these instructions or to answer appropriate questions.

The instructions were tried out with a number of volunteers whose ages ranged
from nine years through adult and whose educational levels ranged from fourth
grade through graduate school. On the basis of these pre-experimental trials,
changes were made to the instructions in the prototypes and to the
administration instructions. Instructions for the practice sessions and the
special visual aids appropriate to each prototype were developed and refined.
The administrators were trained in the use of these materials.

Field Test

A total of 711 airmen participated in the field test. Each took the Word
Knowledge (WK) and Arithmetic Reasoning (AR) subtests from the Armed Services
Vocational Aptitude Battery (ASVAB), as well as the adaptive WK and AR tests.

In addition, enlistment qualification scores (scores of record) on the
Mechanical, Administrative, General, and Electronics (M, A, G, E) composites of
the ASVAB, as well as the composite known as the Armed Forces Qualification Test
(AFQT), were available for every subject. Other demographic data were also
collected.

Instructional manuals were prepared for use by the administrators in assigning
subjects to prototype and subtest. At least 40 subjects were tested at each
session. If the administrators encountered any problems at any of the sessions,
they were asked to record these problems and their resolutions in the manuals
for review by the contractor. The initial day of administration was observed by
the researchers.

For the field tryout of the prototypes, a practice test and an actual test were
administered. For the practice test, half of the subjects were randomly assigned
to the WK adaptive tests and half to the AR adaptive tests; for the actual
testing session, the assignment was reversed. Subjects who took the WK adaptive
test in the practice session took the AR adaptive test during the actual testing
session, and vice versa. Thus, two adaptive tests were administered to each
subject, one for practice and one for actual scoring.

Ability in the routing tests for PI and PII was estimated by maximum likelihood
for each of the 32 possible combinations of right and wrong answers.
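Because each routing path yields a finite set of response patterns, the maximum-likelihood estimates can be computed once and tabulated. The sketch below assumes five answered items (so 2⁵ = 32 patterns), a fixed hypothetical item set rather than the path-dependent items of the real tests, D = 1.7, and a bounded grid search over θ; all of these are illustrative assumptions.

```python
import itertools
import math

D = 1.7  # logistic scaling constant (assumed convention)


def p3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_likelihood(theta, items, pattern):
    """Log-likelihood of a right (1) / wrong (0) pattern on the given items."""
    total = 0.0
    for (a, b, c), u in zip(items, pattern):
        p = p3pl(theta, a, b, c)
        total += math.log(p) if u else math.log(1.0 - p)
    return total


def ml_theta(items, pattern, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood theta found by grid search over [lo, hi].

    A grid avoids the local maxima the 3PL likelihood can have. All-right and
    all-wrong patterns have no finite ML estimate; the grid returns hi or lo.
    """
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, pattern))


# Hypothetical five-item path with illustrative (a, b, c) parameters.
ITEMS = [(1.0, -1.0, 0.2), (1.1, -0.5, 0.2), (1.2, 0.0, 0.2),
         (1.1, 0.5, 0.2), (1.0, 1.0, 0.2)]

# Precompute an estimate for every one of the 2**5 = 32 right/wrong patterns.
THETA_TABLE = {
    pattern: ml_theta(ITEMS, pattern)
    for pattern in itertools.product((0, 1), repeat=len(ITEMS))
}
```

A lookup table like `THETA_TABLE` is what makes the method usable without a computer at test time: the answer sheet only needs to map each pattern to its routing decision.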

The routing test of PIII was designed so that all examinees took all items.

These items were arranged within a narrow band of difficulty and produced a
peaked test information function. The resulting ability estimate was used to
route examinees to the appropriate measurement test.
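Test information is the sum of the item information functions, so clustering item difficulties in a narrow band concentrates precision near the band's center. A minimal sketch with a hypothetical eight-item routing test (difficulties between -0.3 and 0.3, common a = 1.2 and c = 0.2, D = 1.7; none of these values come from the report):

```python
import math

D = 1.7  # logistic scaling constant (assumed convention)


def item_information(theta: float, a: float, b: float, c: float) -> float:
    """Birnbaum's item information function for a 3PL item."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def total_information(theta: float, items) -> float:
    """Test information: the sum of the item information functions."""
    return sum(item_information(theta, *item) for item in items)


# Hypothetical eight-item routing test with difficulties in a narrow band near 0.
PEAKED = [(1.2, b, 0.2) for b in (-0.3, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.3)]

if __name__ == "__main__":
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"theta={theta:+.1f}  I={total_information(theta, PEAKED):.2f}")
```

A peaked routing test measures well only near its center, which is acceptable here because its job is a coarse routing decision; precise measurement is left to the second stage.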

Measurement Test Development

The measurement tests for PI and PII were the same; only the medium of
administration differed between the two prototypes. The tests were developed to
provide maximum measurement precision within a relatively narrow range of
ability. This range was determined by the θ estimate resulting from the routing
test. To ensure adequate coverage of the ability continuum, the measurement test
information functions were carefully designed to overlap. Figure 2 represents
the model.
Figure 2. Overlapping information functions for measurement tests.
... for PIII were constituted in much the same manner as ... that cutting
points were based on the number right ... through 6 show the actual information
functions for ... for all prototypes for both aptitude areas.
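The overlap design can be checked numerically: for every ability level of interest, at least one measurement test should supply substantial information. The sketch below assumes five hypothetical measurement tests centered at θ = -2, -1, 0, 1, 2 (five items each, a = 1.2, c = 0.2, D = 1.7); the centers and parameters are illustrative, not the report's.

```python
import math

D = 1.7  # logistic scaling constant (assumed convention)


def item_information(theta: float, a: float, b: float, c: float) -> float:
    """Birnbaum's item information function for a 3PL item."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def total_information(theta: float, items) -> float:
    """Test information: the sum of the item information functions."""
    return sum(item_information(theta, *item) for item in items)


# Five hypothetical measurement tests, each built from items clustered near its center.
CENTERS = (-2.0, -1.0, 0.0, 1.0, 2.0)
TESTS = [[(1.2, center + offset, 0.2) for offset in (-0.4, -0.2, 0.0, 0.2, 0.4)]
         for center in CENTERS]


def best_test(theta: float) -> int:
    """Index of the measurement test with the largest information at theta."""
    return max(range(len(TESTS)), key=lambda k: total_information(theta, TESTS[k]))


def coverage_floor(lo: float = -2.5, hi: float = 2.5, steps: int = 51) -> float:
    """Smallest best-available information over a grid of theta values."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(max(total_information(t, test) for test in TESTS) for t in grid)
```

If `coverage_floor` drops too low, the gaps between adjacent tests' information functions are too wide and examinees routed near a boundary would be measured imprecisely.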


III. RESULTS


Descriptive statistics for age and the non-adaptive WK and AR test scores were
computed for the subjects. Table 1 presents these statistics for the entire
sample. The sample was 75 percent male and 25 percent female. Table 2 shows the
descriptive statistics for the ability scores, θ, obtained by subjects for each
prototype.

Intercorrelations were computed for all the variables. Tables 3, 4, and 5 show
the intercorrelations for all variables for PI, PII, and PIII.


A significance test (Edwards, 195?) was computed to determine whether the
correlation of each paper-and-pencil test with AFQT differed from the
correlation of the like-named adaptive test with AFQT. In no case were the
differences significant at the predetermined p < .05 level.
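The report does not state which statistic was used. For two correlations that share a variable (here AFQT) and are computed on the same examinees, one standard choice is Hotelling's t with n - 3 degrees of freedom. The sketch below assumes that test; the correlations in the example are hypothetical, not values from the tables.

```python
import math


def hotelling_t(r_xy: float, r_zy: float, r_xz: float, n: int) -> tuple[float, int]:
    """Hotelling's t for H0: rho_xy == rho_zy, with x, z, y measured on the same n people.

    r_xy, r_zy: correlations of each test with the shared variable y (e.g. AFQT).
    r_xz: correlation between the two tests themselves.
    Returns (t, degrees of freedom).
    """
    # Determinant of the 3 x 3 correlation matrix of x, z, and y.
    det = 1 - r_xy**2 - r_zy**2 - r_xz**2 + 2 * r_xy * r_zy * r_xz
    t = (r_xy - r_zy) * math.sqrt((n - 3) * (1 + r_xz) / (2 * det))
    return t, n - 3


if __name__ == "__main__":
    # Hypothetical values: paper-and-pencil vs. adaptive test, each correlated with AFQT.
    t, df = hotelling_t(r_xy=0.70, r_zy=0.65, r_xz=0.75, n=104)
    print(f"t = {t:.2f}, df = {df}")
```

With about 100 degrees of freedom, |t| must exceed roughly 1.98 for a difference to be significant at the two-tailed .05 level. Treating the two correlations as independent (for example, with separate Fisher z transforms) would ignore the shared examinees and the correlation between the two tests.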

The time required to complete the adaptive tests was recorded; ASVAB
administration times are fixed. Table 6 presents the mean and standard deviation
of the time required to complete both types of tests.

The subjects were also questioned about their perceptions of the adaptive tests
as compared to traditional paper-and-pencil tests. Table 7 presents a summary of
their responses.


IV. DISCUSSION

Three prototype methods were developed to test the efficacy of paper-and-pencil
adaptive tests. Routing of the examinees through the test was accomplished by
one of two procedures. In one routing procedure, the examinees were routed from
item to item, depending on their answers to previous items. The sequence of
items answered determined the second-stage level of testing. The second routing
procedure provided for all the examinees to answer the same items in the
first-stage test. The number of correct responses in the first stage determined
the second-stage level of testing.

Two subtests (Arithmetic Reasoning and Word Knowledge) were administered to each
examinee in a counterbalanced design: one for practice and one for the actual
test. The items for these subtests were selected from item pools provided by the
Air Force Human Resources Laboratory. ASVAB subtests in the same areas were also
administered to each examinee. Examinees participated as subjects for one of
three prototypes. These data were correlated with the like-named ASVAB subtest
scores and with enlistment qualification composites obtained from existing
records.

The results of the analyses showed that the prototype methods were successful.
There was a high correlation between the ability estimates of the examinees on
the subtests within each prototype and their scores on the corresponding ASVAB
subtests. Significance tests indicated that these observed correlations did




Table 1

Descriptive Statistics of Age and Test Scores* for Subjects
(N = 711)

Variable        Mean    Standard Deviation    Skew    Kurtosis
Age (years)    20.50          2.11            1.10       .08
AFQT           64.??         15.11             .32      -.45
M              61.29         25.05            -.0?      -.90
A              ?9.7?         15.17            -.66      -.02
G              72.66         15.16            -.30      -.80
E              71.72         17.62            -.75      -.03
ASVAB-WK       22.57          4.92            -.48      -.46
ASVAB-AR       13.?0          3.51            -.03      -.67

*AFQT, M, A, G, and E are reported in percentile equivalents, while WK and AR
are reported in number-right scores. (? marks digits illegible in the source.)

Table 2

Descriptive Statistics for Word Knowledge and Arithmetic Reasoning Adaptive Tests

Prototype   Aptitude   Mean          Standard Deviation    N
I           AR         -.25                .79            111
I           WK         (illegible)        1.07             73
II          AR         -.1?                .76            117
II          WK         (illegible)         .87            120
III         AR         -.07                .84            104
III         WK          .21                .85             67
Table 5

Intercorrelations* of AFQT, Age, Sex, and Test Score Variables for Prototype III


FQT 

.  IS 

NTS** 

.51 

.51 

.55 

.84 

.75 

.68 

.51 

r.T 

-  .03 

\ 

X 

.18 

.05 

.21 

.  14 

.07 

.15 

.14 

IX** 

X 

X 

S\ 

X 

X 

X 

X 

X 

X 

X 

.60 

.06 

l 

.27 

.44 

.46 

.43 

.25 

.73 

.50 

-.10 

X 

.40 

.05 

.35 

.63 

.39 

.32 

.38 

.24 

X 

.36 

.11 

\ 

.63 

.30 

.32 

.50 

.89 

.02 

X 

.73 

.50 

.35 

\ 

.57 

.77 

.53 

.87 

-.10 

X 

.54 

.70 

.32 

.81 

\ 

.42 

.42 

.70 

.06 

X 

.85 

.41 

.40 

.72 

.59 

.32 

.74 

.02 

X 

.54 

.43 

.51 

.76 

.74 

.59 

*Entries above the diagonal are for the Arithmetic Reasoning adaptive test, and
those below are for the Word Knowledge adaptive test.


*No  female  subjects. 

Table 6

Mean and Standard Deviation of Test Administration Times

Test    Mean Time    Standard Deviation


*ASVAB tests of AR and WK are fixed-time.


Table 7

Responses to Adaptive Versus Linear Test
not differ. The adaptive tests and the linear tests appear to be measuring the
same aptitude.

Savings were obtained in the average time required to complete the adaptive
tests as compared to the conventional paper-and-pencil tests. The Arithmetic
Reasoning (AR) and Word Knowledge (WK) subtests represent the item types that
usually require the most and the least time per item to administer,
respectively. The reduction in AR time was about 66 percent of the usual
required time, while WK time was reduced to less than half the usual time.

A fully adaptive battery could be expected to allow six additional subtests to
be given in the same time required to administer Forms 6 and 7 of the ASVAB.
This would provide superior measurement by enabling more data to be collected on
each examinee. A reduction in classification decision errors would follow from
this additional information.
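The figure of six additional subtests rests on the report's own timing data, which are not reproduced here. The kind of arithmetic involved can be sketched under a simplifying assumption that every subtest shrinks by the same fraction; the function name and all times below are hypothetical.

```python
def extra_subtests(conventional_minutes: list[float], time_ratio: float) -> int:
    """How many additional adaptive subtests fit in the conventional battery's time.

    conventional_minutes: hypothetical per-subtest times for the fixed battery.
    time_ratio: adaptive time as a fraction of conventional time (assumed uniform).
    """
    total = sum(conventional_minutes)
    adaptive = [m * time_ratio for m in conventional_minutes]
    freed = total - sum(adaptive)                  # minutes saved by going adaptive
    mean_adaptive = sum(adaptive) / len(adaptive)  # typical adaptive subtest length
    return int(freed // mean_adaptive)


if __name__ == "__main__":
    # Hypothetical: twelve 10-minute subtests, each cut to half its time.
    print(extra_subtests([10.0] * 12, 0.5))
```

Under this uniform-ratio assumption, halving every subtest's time doubles the number of subtests that fit, so the gain depends strongly on which item types dominate the battery.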

Examinees' responses to the questions on their perceptions of the adaptive
testing prototypes were generally favorable, as has been found elsewhere
(Prestwood & Weiss, 197?). These methods allowed examinees to be tested at
their own level of ability and to proceed at their own rate. In addition, many
felt that this kind of testing was easier than traditional testing because there
were fewer items to answer, and that the test taking was less fatiguing than
with traditional methods.

This effort provides a successful demonstration that adaptive testing can be
conducted without the use of expensive computers. Further exploration and
development with other aptitude areas and with a traditional criterion will have
to be accomplished before any long-range decisions are made about the general
implementation of these methods in the Armed Forces testing program.
Due to norming problems encountered with ASVAB Forms 5, 6, and 7, percentile
scores derived from these test forms are in error. While the relative ranking of
individuals by their percentile scores would not be affected by the norming
errors, their absolute score values would be different. Therefore, descriptive
statistics reported in the subject technical reports above are erroneous; other
types of analyses in the report that use ASVAB percentile scores should be
interpreted with caution.